Abstract. Economical archival and retrieval of image data is becoming increasingly important 
considering the unprecedented data volumes expected from the Earth Observation System (EOS) 
instruments. For cost-effective browsing of the image data (possibly from remote sites) and 
retrieval of the original image data from the data archive, we suggest an integrated image browse 
and data archive system employing incremental transmission. 

We produce our browse image data with the JPEG/DCT lossy compression approach. The residual 
image data is then obtained by taking the pixel-by-pixel differences between the original data 
and the browse image data. We then code the residual data with a form of variable-length coding 
called diagonal coding. 

In our experiments, JPEG/DCT is used at different quality factors (Q) to generate the browse 
and residual data. The algorithm has been tested on band 4 of two Thematic Mapper (TM) data 
sets. The best overall compression ratios (about 1.7) were obtained when a quality factor of 
Q = 50 was used to produce browse data at compression ratios of about 10 to 11. At this quality 
factor the browse image data has virtually no visible distortions for the images tested. 


1. Introduction 

Economical archival and retrieval of image data is becoming increasingly important considering 
the unprecedented data volumes expected from the Earth Observation System (EOS) 
instruments. The challenge EOS presents to the information scientist is to provide a cost-effective 
mechanism for (i) browsing the image data (possibly from remote sites) and (ii) 
obtaining the original image data from the data archive. We suggest that these two mechanisms 
be integrated, i.e., that the lossless image data be reconstructed from the browse image data 
by incremental transmission. 

The data archive's integrity is maintained as long as every bit of the original image data can be 
reliably reconstructed from its compressed form. Nevertheless, lossless compression is 
not very effective in reducing data volume: maximum compression ratios of 2.0 to 2.5 are 
typical for the type of image data expected from EOS instruments. Lossy compression, on the 
other hand, can typically provide compression ratios as high as 30 to 50 without significant 
visible degradation of the image data. However, because the original image data cannot be 
perfectly reconstructed from this highly compressed data, it can only be used for data browsing 
and, possibly, certain preliminary analyses. 

In most data archive schemes, highly compressed data is kept in on-line storage and used to 
efficiently browse the data to determine potentially useful data set(s) for further processing. 
Once this decision is made, the original data is obtained from off-line storage. The browse 
quality image data and the corresponding original image data contain redundant information, 
causing a fraction of the information to be transmitted twice. 

If incremental data is stored off-line instead of the original data, data transmission to users can be 
made more efficient. In this approach the image data is decomposed into browse and residual data so 
that information is duplicated neither in the data archive nor in transmission to users across 
computer networks. 

In this paper we address the problem of decomposing image data into browse and residual data 
in a manner that is most appropriate for image data archival. Browse data should take only a 
small fraction (typically 1/30 to 1/50) of the storage required for original data with quality that is 
adequate for deciding whether the data is useful or not for an intended application. The residual 
data, normally kept off-line, should have relatively high compressibility using a carefully 
designed lossless compression technique. Thus, the key problems are to select a lossy 
compression approach that provides the best compression with quality that is nearly lossless 
visually, and to select the most effective lossless compression approach for the residual. In 
addition, we also determine the browse data compression ratio that leads to the best overall 
compression. 


2. JPEG/DCT Approach for Browse Quality Image Generation 

Any of several lossy compression techniques, such as subband/wavelet coding and vector 
quantization, could be used to produce the browse quality image. We chose to use the 
JPEG/DCT lossy compression approach for the following reasons. 

i. The JPEG/DCT lossy compression approach has become an industry-wide standard 
compression approach. 

ii. Special hardware boards are available commercially for various machines, including the 
ubiquitous IBM PC. 

iii. The image quality of the browse data can be fine-tuned until it is visually lossless. 

JPEG lossy compression is based on the Discrete Cosine Transform (DCT) of 8x8 blocks of the 
input image [1-2]. In the encoding process, the samples in the input image are grouped into 8x8 
blocks, and each block is transformed by the forward DCT (FDCT) into a set of 64 coefficients 
referred to as the DCT coefficients. The first coefficient is the DC coefficient, and 
the remaining 63 are AC coefficients. Each of the 64 coefficients is then quantized using the 
corresponding one of 64 values from a quantization table. The DC coefficients of successive 
blocks undergo differential coding, and the AC coefficients are ordered into a one-dimensional 
zigzag sequence. Finally, the quantized coefficients are compressed using either Huffman coding or 
arithmetic coding. 
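The encoding steps above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration under stated assumptions, not the encoder used for the results reported here: it uses the direct form of the 8x8 DCT, the example luminance quantization table from Annex K of the JPEG standard (the paper does not say which table was used), and it stops before the Huffman or arithmetic coding stage. The function names (`fdct_8x8`, `zigzag_order`, `encode_block`, `dc_differences`) are hypothetical.

```python
import math

# Example luminance quantization table from the JPEG standard
# (ITU-T T.81, Annex K, Table K.1), row-major order. Assumed, not from the paper.
LUMA_Q = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def _c(k):
    # Normalization factor C(k) from the JPEG DCT definition.
    return 1 / math.sqrt(2) if k == 0 else 1.0


def fdct_8x8(block):
    """Forward 8x8 DCT (direct O(N^4) form) of a level-shifted block."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * _c(u) * _c(v) * s
    return out


def zigzag_order(n=8):
    """(row, col) positions in JPEG zigzag order."""
    order = []
    for s in range(2 * n - 1):  # walk the anti-diagonals
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order


def encode_block(block, qtable=LUMA_Q):
    """Level-shift, transform, quantize, and zigzag-scan one 8x8 block of 8-bit samples."""
    shifted = [[p - 128 for p in row] for row in block]
    coeffs = fdct_8x8(shifted)
    # Every one of the 64 coefficients is divided by its own table entry and rounded.
    quant = [[round(coeffs[r][c] / qtable[8 * r + c]) for c in range(8)]
             for r in range(8)]
    return [quant[r][c] for r, c in zigzag_order()]  # element 0 is the DC term


def dc_differences(dc_values):
    """Differential coding of the DC terms of successive blocks."""
    return [dc_values[0]] + [b - a for a, b in zip(dc_values, dc_values[1:])]
```

A flat 8x8 block has all of its energy at zero frequency, so `encode_block` returns a single nonzero DC term followed by 63 zeros; this long run of zeros is what the zigzag ordering exposes to the entropy coder.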

The baseline JPEG/DCT process accepts 8-bit images and uses two Huffman tables for coding the DC and 
AC coefficients. The extended JPEG lossy processes allow 8-bit or 12-bit sample precision with 
either Huffman or arithmetic coding of the coefficients. At the decoding end, the 64 dequantized 
coefficients of each block are used to reconstruct an 8x8 coefficient array, which is then mapped 
back to image space by the inverse DCT (IDCT). 
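The decoding side described above can be sketched the same way: undo the zigzag scan, multiply each coefficient back by its table entry, apply the inverse DCT, and undo the 128 level shift. This is a self-contained, hypothetical illustration (names such as `idct_8x8` and `decode_block` are ours), written in the direct O(N^4) form for clarity rather than speed.

```python
import math


def _c(k):
    # Normalization factor C(k) from the JPEG DCT definition.
    return 1 / math.sqrt(2) if k == 0 else 1.0


def zigzag_order(n=8):
    """(row, col) positions in JPEG zigzag order."""
    order = []
    for s in range(2 * n - 1):
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()
        order.extend(cells)
    return order


def idct_8x8(coeffs):
    """Inverse 8x8 DCT (direct form); returns level-shifted samples."""
    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (_c(u) * _c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = 0.25 * s
    return out


def decode_block(zz, qtable):
    """Undo the zigzag scan, dequantize, inverse-transform, and clamp to 8 bits."""
    coeffs = [[0.0] * 8 for _ in range(8)]
    for k, (r, c) in enumerate(zigzag_order()):
        coeffs[r][c] = zz[k] * qtable[8 * r + c]  # dequantize
    spatial = idct_8x8(coeffs)
    return [[min(255, max(0, round(p) + 128)) for p in row] for row in spatial]
```

The rounding in the quantizer is the only lossy step; everything on this decoding path is invertible up to floating-point error, which is why the residual defined in Section 3 is needed for lossless reconstruction.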

The JPEG/DCT approach provides a tuning parameter, the quality factor Q, which corresponds to different 
qualities of the compressed images. For typical NASA image data, a low value of Q, such as 20, provides 
high compression with poor image fidelity. As Q increases, the fidelity improves at the 
expense of compression ratio. For Q = 80, the compressed images are generally visually 
indistinguishable from the input images, with a compression ratio typically in the range of 6.0 to 
7.0. For data from the Landsat TM instrument, a general image quality rating for different Q 
values and the corresponding compression ratios (CR) is: 


Q        CR       Image quality
25-40    25-12    moderate to good quality
40-70    12-8     good to very good quality
70-80    8-6      excellent quality
80-90    6-4      indistinguishable from original
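The paper does not say how Q is turned into a quantization table. A widely used convention is the one in the Independent JPEG Group (IJG) libjpeg library, where Q = 50 leaves the base table unchanged and other values scale it up or down. The sketch below assumes that convention and the Annex K example luminance table; `scaled_qtable` is a hypothetical name, and other JPEG software may map Q differently.

```python
# Example luminance table from the JPEG standard (ITU-T T.81, Annex K). Assumed.
BASE_LUMA_Q = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scaled_qtable(quality, base=BASE_LUMA_Q):
    """Scale a base quantization table by a 1-100 quality factor (IJG-style convention)."""
    quality = min(100, max(1, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Integer rounding, then clamp to the 1..255 range allowed in baseline JPEG.
    return [min(255, max(1, (q * scale + 50) // 100)) for q in base]
```

Larger Q gives smaller table entries, so less information is discarded and the compression ratio drops, which is the trend in the table above.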


Several EOS instruments are expected to have a dynamic range of 0-4095; that is, the pixel 
brightness level can be represented by 12 bits. However, the human perceptual system cannot 
even resolve 256 gray levels (i.e., a range of 0-255), which can be represented by 8 bits. 
Therefore, the first stage of producing the browse data is as follows: determine 
the actual dynamic range of the data (which may be less than, but no more than, 12 bits), and 
retain only the 8 most significant bits within that dynamic range. Then compress this 8-bit data with 
JPEG/DCT at the optimal quality factor. For lossless compression, the remaining low-order bits and 
the residual from JPEG compression are separately compressed using an appropriate lossless 
compression approach. Such an approach is described in the following section. 
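A minimal sketch of the bit split described above, assuming nonnegative integer samples and reading "actual dynamic range" as the bit length of the largest sample (measuring it from the minimum value instead is another possible reading). The helpers `split_msb` and `merge_msb` are hypothetical.

```python
def split_msb(values, keep_bits=8):
    """Split samples into their top `keep_bits` bits and the remaining low bits."""
    # Actual dynamic range, taken here as the bit length of the largest sample.
    used_bits = max(max(values).bit_length(), keep_bits)
    shift = used_bits - keep_bits
    mask = (1 << shift) - 1
    msb = [v >> shift for v in values]  # 8-bit data handed to JPEG/DCT
    low = [v & mask for v in values]    # leftover bits, coded losslessly
    return msb, low, shift


def merge_msb(msb, low, shift):
    """Lossless inverse of split_msb."""
    return [(m << shift) | lo for m, lo in zip(msb, low)]
```

For full 12-bit data the shift is 4, so each sample contributes 8 bits to the browse path and 4 bits to the lossless path; for data that already fits in 8 bits the shift is 0 and nothing is set aside.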


3. Residual Compression using Diagonal Codes 

The residual image data is obtained by taking the pixel-by-pixel differences 
between the original data and the image reconstructed after lossy compression. We have 
observed that the residual image data obtained from JPEG/DCT compression is low-entropy data 
that is compressible to a greater degree than the original image data. The better the browse data 
approximates the original data, the more compressible the residual image data is. Thus, a higher 
quality browse image yields a residual that compresses better in lossless mode. 
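The decomposition itself is just a subtraction, and lossless recovery is the matching addition. The sketch below works on flattened pixel lists; `residual` and `reconstruct` are hypothetical names, and the browse values are assumed to be the decoded (not the compressed) browse image.

```python
def residual(original, browse):
    """Pixel-by-pixel difference between the original and the decoded browse image."""
    return [o - b for o, b in zip(original, browse)]


def reconstruct(browse, res):
    """Lossless reconstruction: decoded browse image plus residual."""
    return [b + r for b, r in zip(browse, res)]
```

Because only the small differences are sent in the second increment, a user who already holds the browse image never receives the same information twice.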

However, a higher quality browse image requires more bits per pixel. Since the overall lossless 
representation is the sum of the bits per pixel for the browse data and the residual data, maximizing 
overall lossless compression requires finding the optimal balance between the bits allocated to 
the browse data and the bits consequently required for the residual data. 
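In symbols (our notation, not the paper's), let n be the original bits/pixel, b_B the browse bits/pixel and b_R the residual bits/pixel after lossless coding. The browse and overall lossless compression ratios are then

$$
CR_B = \frac{n}{b_B}, \qquad
CR = \frac{n}{b_B + b_R} = \frac{n}{n/CR_B + b_R}.
$$

Raising Q raises b_B (lowers CR_B) while lowering b_R, so CR is largest at the Q where a further increase in b_B is no longer repaid by at least as large a decrease in b_R.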

For remote browsing applications, the browse data bit rate (bits/pixel) must be kept very low to 
ensure efficient transmission of the data across the computer networks. This requirement leads 
to choosing the lowest JPEG/DCT quality factor without significant visual degradation of the 
reconstructed image data, which we have found to be a quality factor of about 50. Fortunately, 
our experiments have found that a quality factor of about 50 also corresponds closely to the 
browse bit rate that produces the optimal overall lossless compression in combination with the 
residual image data. 

The residual image data exhibits a Laplacian distribution with a smaller variance than 
the original image data. This property suggests that a form of variable-length encoding 
would be most appropriate for lossless compression of this data. Specifically, we have found that 
a type of variable-length encoding called diagonal coding [3,4] is most appropriate. 

For images with n bits/pixel, a straightforward representation of the residual image data requires 
n+1 bits/pixel, because the difference of two n-bit values spans both signs. However, through using 
Golomb codes [5], the residual data requires just n bits/pixel (prior to diagonal coding). 
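One common way to fit an n-bit-sample residual into n bits is modular (two's-complement-style) reduction: since the decoder already knows the browse value, the difference only needs to be known modulo 2^n. The sketch below shows that device; we do not claim it is the construction of [5], and `fold_residual` / `unfold_residual` are hypothetical names.

```python
def fold_residual(original, browse, n_bits):
    """Map the (n+1)-bit difference original - browse into n bits."""
    m = 1 << n_bits
    r = (original - browse) % m          # now in [0, 2**n)
    return r - m if r >= m // 2 else r   # signed form in [-2**(n-1), 2**(n-1))


def unfold_residual(browse, r, n_bits):
    """Recover the original sample from the browse value and the n-bit residual."""
    return (browse + r) % (1 << n_bits)
```

Small differences keep their sign and magnitude under this mapping, so the narrow Laplacian shape that the diagonal code relies on is preserved; only the rare large differences wrap around.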

In our approach, the residual image data is divided into two parts. The first part contains the 
two lower-order bits, while the second part contains the remaining six higher-order bits. The 
frequency distribution of the lower-order bits exhibits no particular structure and thus can be 
compressed very little. However, the frequency distribution of the higher-order bits exhibits a 
narrow Laplacian distribution. For this type of distribution, Rice et al. [3] have shown that the 
diagonal code is asymptotically optimal. In this code, each value is represented by a number of 
zeros equal to that value, terminated by a one. For six-bit data, the diagonal code for 
"000101" (5) is "000001", and the diagonal code for "010100" (20) is "000000000000000000001". 
Since higher values in the residual data occur less frequently, this code turns out to be optimal. 
This representation is very efficient for coding as well as decoding. 
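The two-part scheme above can be sketched as follows: the high-order bits are sent as a diagonal (unary) code and the two low-order bits are appended verbatim, which makes the result structurally a Rice code with parameter k = 2. The sketch assumes the residual words are already nonnegative (for example after a sign mapping); the paper does not say how the sign is handled in this scheme. Function names are hypothetical.

```python
def diagonal_code(value):
    """Diagonal (unary) code: `value` zeros terminated by a one."""
    return "0" * value + "1"


def encode_residual_word(word, low_bits=2):
    """Code the high-order bits diagonally and send the low-order bits verbatim."""
    high, low = word >> low_bits, word & ((1 << low_bits) - 1)
    return diagonal_code(high) + format(low, f"0{low_bits}b")


def decode_stream(bits, low_bits=2):
    """Inverse of encode_residual_word applied to a concatenated bit string."""
    words, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the diagonal-code zeros
            zeros += 1
            i += 1
        i += 1  # skip the terminating one
        low = int(bits[i:i + low_bits], 2)
        i += low_bits
        words.append((zeros << low_bits) | low)
    return words
```

Because every code word ends its prefix with a one and then has a fixed-width tail, the decoder needs no lookup tables: it counts zeros, skips the one, and reads two bits.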

The diagonal code we propose is as follows. The residual values are divided into sets of four 
values centered about zero, such that each set contains two negative and two positive residual 
values, except the first set, which contains zero. If the residual value belongs to set 1, it is 
represented by 1; if it belongs to set 2, it is represented by 01; if it belongs to set 3, it is 
represented by 001; and so on. In general, if the residual value belongs to the i-th set, its 
representation is a series of i-1 zeros followed by a 1. Typical sets and their representations are 
shown below: 


Set   Range                Diagonal code
1     (-1, 0, 1, 2)        1, followed by two bits identifying the actual value
2     (-3, -2, 3, 4)       01
3     (-5, -4, 5, 6)       001
4     (-7, -6, 7, 8)       0001
5     (-9, -8, 9, 10)      00001
6     (-11, -10, 11, 12)   000001
7     (-13, -12, 13, 14)   0000001
8     (-15, -14, 15, 16)   00000001
9     (-17, -16, 17, 18)   000000001
10    (-19, -18, 19, 20)   0000000001
11    (-21, -20, 21, 22)   00000000001
12    (-23, -22, 23, 24)   000000000001
13    (-25, -24, 25, 26)   0000000000001
14    (-27, -26, 27, 28)   00000000000001
15    (-29, -28, 29, 30)   000000000000001
16    (-31, -30, 31, 32)   0000000000000001
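A minimal encoder/decoder for the set code in the table, under two assumptions of ours: every set's prefix is followed by two identification bits (the table states this explicitly only for set 1, but four values per set need two bits to tell apart), and the two-bit labels within a set follow the interleaving 0, 1, -1, 2, -2, 3, ... That interleaving happens to reproduce exactly the sets listed above (indexes 0-3 form set 1, 4-7 form set 2, and so on), so the scheme is structurally a Rice code with k = 2 on interleaved residuals. All names are hypothetical.

```python
def to_index(v):
    """Interleave signed residuals: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * v - 1 if v > 0 else -2 * v


def from_index(k):
    """Inverse of to_index."""
    return (k + 1) // 2 if k % 2 else -(k // 2)


def set_number(v):
    """Diagonal-code set (1, 2, 3, ...) containing residual value v."""
    return to_index(v) // 4 + 1


def encode_value(v):
    k = to_index(v)
    # (set - 1) zeros, a terminating one, then two bits picking the value in the set.
    return "0" * (k // 4) + "1" + format(k % 4, "02b")


def decode_values(bits):
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        within = int(bits[i + 1:i + 3], 2)  # two bits after the terminating one
        i += 3
        values.append(from_index(4 * zeros + within))
    return values
```

With this mapping the most frequent residuals (-1, 0, 1, 2) each cost three bits, and every further set costs one more bit, matching the lengths implied by the table.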


4. Experimental Results and Conclusions 

We have tested our compression approach on band 4 of Landsat Thematic Mapper images of 
Washington, DC and of Davidsonville, LA (northwest of New Orleans, LA). The browse data 
was generated using JPEG/DCT at quality factors of 25, 50, and 75. Table 1 shows the 
frequency distribution of the Washington, DC residual data over the diagonal code sets at these 
quality factors: 





Table 1. Washington, DC residual data image statistics.

Diagonal code set   Q = 25   Q = 50   Q = 75
1                   .3393    .4101    .4911
2                   .2556    .2883    .3055
3                   .1794    .1686    .1381
4                   .1102    .0819    .0487
5                   .0599    .0339    .0127
6                   .0311    .0118    .0030
7                   .0138    .0036    .0005
8                   .0060    .0010    .0001
9                   .0026    .0002    .00004


The compression performance of the algorithm is summarized in Tables 2 and 3 for the two data 
sets used in our experiments. For three different quality factors, the browse 
compression ratio (CR_B), the overall lossless compression ratio (CR), and the ratio of CR to the 
compression ratio implied by the first-order entropy (CR_e) are tabulated. From the tables we see 
that the best compression ratio in lossless mode corresponds to a quality factor of Q = 50. 
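The entropy-based reference in the last column can be computed as sketched below. We read CR_e as the compression ratio n/H that a coder achieving the first-order (memoryless) entropy H would give; the paper does not state exactly which data (residual only, or browse plus residual) enters that figure, so this is an interpretation. `first_order_entropy` and `entropy_cr` are hypothetical names.

```python
import math
from collections import Counter


def first_order_entropy(samples):
    """First-order (memoryless) entropy in bits per sample."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_cr(n_bits, bits_per_pixel):
    """Compression ratio corresponding to a given coded bit rate."""
    return n_bits / bits_per_pixel
```

A ratio CR/CR_e close to 1, as in Tables 2 and 3, means the coder is nearly as efficient as any memoryless code could be on that data.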


Table 2. Washington, DC (Band 4)

Q     CR_B    CR      CR/CR_e
25    20.0    1.63    0.972
50    11.6    1.67    0.985
70    7.5     1.64    0.977

Table 3. New Orleans (Band 4)

Q     CR_B    CR       CR/CR_e
25    16.1    1.599    .9198
50    10.3    1.653    .9901
75    6.9     1.633    .9774
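As a consistency check on Tables 1 and 2, the sketch below estimates the residual bit rate b_R from the Table 1 set frequencies and combines it with CR_B from Table 2 using CR = n/(n/CR_B + b_R). The assumptions are ours: TM band 4 samples are 8 bits (n = 8); every pixel in set i costs i prefix bits plus two identification bits; pixels in sets above 9, which Table 1 does not list, are ignored; and only Q = 25 and Q = 50 are used, because the third column of Table 1 is labeled Q = 75 while the third row of Table 2 is labeled Q = 70. Under these assumptions the estimates land within about 0.01 of the CR column of Table 2.

```python
# Fractions of residual pixels per diagonal-code set (Table 1, Washington, DC).
SET_FREQ = {
    25: [.3393, .2556, .1794, .1102, .0599, .0311, .0138, .0060, .0026],
    50: [.4101, .2883, .1686, .0819, .0339, .0118, .0036, .0010, .0002],
}
# Browse compression ratios CR_B from Table 2.
BROWSE_CR = {25: 20.0, 50: 11.6}


def residual_bits_per_pixel(freqs, id_bits=2):
    """Expected code length: set i costs i prefix bits plus id_bits (assumed)."""
    return sum(p * (i + id_bits) for i, p in enumerate(freqs, start=1))


def overall_cr(n_bits, cr_browse, residual_bpp):
    """Overall lossless ratio from the browse ratio and residual bit rate."""
    return n_bits / (n_bits / cr_browse + residual_bpp)


if __name__ == "__main__":
    for q in (25, 50):
        b_r = residual_bits_per_pixel(SET_FREQ[q])
        print(q, round(b_r, 3), round(overall_cr(8, BROWSE_CR[q], b_r), 2))
```

The estimate also makes the trade-off of Section 3 concrete: going from Q = 25 to Q = 50 adds about 0.29 bits/pixel to the browse data but removes about 0.39 bits/pixel from the residual.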


We have described a method of decomposing image data into a browse image and residual 
image data for active archival and distribution of data. We have found that, with the variant of 
diagonal coding proposed here applied to the residual, the best overall compression ratio is obtained 
when the browse data is generated by JPEG/DCT at a quality factor of 50. This quality factor provides 
browse quality with very little visible distortion for the images tested. 